Overview

Brought to you by YData

Dataset statistics

Number of variables: 41
Number of observations: 74250
Missing cells: 58358
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.8 MiB
Average record size in memory: 336.0 B
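Missing cells (%) is the number of missing cells divided by the total number of cells (observations × variables): 58358 / (74250 × 41) = 58358 / 3044250 ≈ 1.9%. The sketch below recomputes these overview counts for a small in-memory table. It assumes rows are Python dicts sharing the same keys, with a missing cell stored as None. That representation is an assumption for illustration, not how ydata-profiling stores data.

```python
def overview_stats(records: list[dict]) -> dict:
    """Recompute a few dataset-level overview figures.

    Assumes every record has the same keys and that missing cells are None.
    """
    n_obs = len(records)
    columns = list(records[0].keys()) if records else []
    n_vars = len(columns)
    missing = sum(1 for row in records for col in columns if row.get(col) is None)
    total_cells = n_obs * n_vars

    # A duplicate row is one whose full tuple of values was already seen earlier
    seen = set()
    duplicates = 0
    for row in records:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }

# The report's own figures: 58358 missing cells out of 74250 x 41 cells
print(f"{100 * 58358 / (74250 * 41):.1f}%")  # prints 1.9%
```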

Variable types

Numeric: 10
DateTime: 1
Text: 7
Categorical: 21
Boolean: 2

Alerts

recorded_by has constant value "GeoData Consultants Ltd" [Constant]
public_meeting is highly imbalanced (56.2%) [Imbalance]
management_group is highly imbalanced (69.1%) [Imbalance]
water_quality is highly imbalanced (71.3%) [Imbalance]
quality_group is highly imbalanced (67.9%) [Imbalance]
funder has 4507 (6.1%) missing values [Missing]
installer has 4532 (6.1%) missing values [Missing]
public_meeting has 4155 (5.6%) missing values [Missing]
scheme_management has 4847 (6.5%) missing values [Missing]
scheme_name has 36052 (48.6%) missing values [Missing]
permit has 3793 (5.1%) missing values [Missing]
amount_tsh is highly skewed (γ1 = 56.37002144) [Skewed]
num_private is highly skewed (γ1 = 91.3269825) [Skewed]
id is uniformly distributed [Uniform]
id has unique values [Unique]
amount_tsh has 52049 (70.1%) zeros [Zeros]
gps_height has 25649 (34.5%) zeros [Zeros]
longitude has 2269 (3.1%) zeros [Zeros]
num_private has 73299 (98.7%) zeros [Zeros]
population has 26834 (36.1%) zeros [Zeros]
construction_year has 25969 (35.0%) zeros [Zeros]
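Each alert above is a threshold test on a per-variable summary. Every variable flagged Skewed has |γ1| above 20, and population (γ1 ≈ 11.8) is not flagged, so a skewness threshold of 20 is consistent with this report. The zeros and missing-value thresholds below are hypothetical values, chosen only so that the sketch reproduces this report's flags. Check the downloaded config.json for the thresholds that were actually used.

```python
from dataclasses import dataclass

@dataclass
class ColumnSummary:
    name: str
    skewness: float   # sample skewness (use 0.0 for non-numeric columns)
    p_zeros: float    # share of zero values, 0..1
    p_missing: float  # share of missing values, 0..1

# Hypothetical thresholds: 20 for skewness is consistent with this report;
# the other two were picked only so the examples below match its alerts.
SKEW_THRESHOLD = 20.0
ZEROS_THRESHOLD = 0.01
MISSING_THRESHOLD = 0.05

def alerts(s: ColumnSummary) -> list[str]:
    """Return the alert labels a summary triggers under the thresholds above."""
    found = []
    if abs(s.skewness) > SKEW_THRESHOLD:
        found.append("Skewed")
    if s.p_zeros > ZEROS_THRESHOLD:
        found.append("Zeros")
    if s.p_missing > MISSING_THRESHOLD:
        found.append("Missing")
    return found

print(alerts(ColumnSummary("amount_tsh", 56.37, 52049 / 74250, 0.0)))  # ['Skewed', 'Zeros']
print(alerts(ColumnSummary("district_code", 3.96, 27 / 74250, 0.0)))   # []
print(alerts(ColumnSummary("permit", 0.0, 0.0, 3793 / 74250)))         # ['Missing']
```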

Reproduction

Analysis started: 2025-04-26 21:27:31.145605
Analysis finished: 2025-04-26 21:27:36.338674
Duration: 5.19 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json

Variables

id
Real number (ℝ)

Uniform, Unique

Distinct: 74250
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37124.5
Minimum: 0
Maximum: 74249
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 3712.45
Q1: 18562.25
Median: 37124.5
Q3: 55686.75
95th percentile: 70536.55
Maximum: 74249
Range: 74249
Interquartile range (IQR): 37124.5
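id holds every integer from 0 to 74249 exactly once, so its quantiles follow directly from linear interpolation between order statistics at position q × (n - 1). For example, the 5th percentile is 0.05 × 74249 = 3712.45. Below is a minimal sketch of that rule, which is also what NumPy's default `linear` method uses. That the report computed quantiles this way is inferred from the matching numbers, not confirmed.

```python
import math

def percentile(values, q: float) -> float:
    """Quantile q (0..1) by linear interpolation between sorted values.

    The position on the sorted data is q * (n - 1).
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

ids = range(74250)  # every integer from 0 to 74249, once each
for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, round(percentile(ids, q), 2))
```

The loop reproduces the 5th percentile, Q1, median, Q3 and 95th percentile listed above.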

Descriptive statistics

Standard deviation: 21434.273
Coefficient of variation (CV): 0.57736193
Kurtosis: -1.2
Mean: 37124.5
Median Absolute Deviation (MAD): 18562.5
Skewness: 1.0257844 × 10^-18
Sum: 2.7564941 × 10^9
Variance: 4.5942806 × 10^8
Monotonicity: Not monotonic
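Several of these figures can be recomputed by hand:

- **Variance.** It matches the sample (n - 1) estimator. For the integers 0 to N - 1, that variance equals N(N + 1) / 12 = 74250 × 74251 / 12 = 459428062.5 ≈ 4.5942806 × 10^8.
- **Coefficient of variation.** It is the standard deviation over the mean: 21434.273 / 37124.5 ≈ 0.5774.
- **MAD.** It is the median of absolute deviations from the median, with no scaling constant.

These definitions are inferred from the reported values. A stdlib sketch:

```python
import statistics

def coefficient_of_variation(values) -> float:
    """Sample standard deviation (n - 1 denominator) divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)

def median_absolute_deviation(values) -> float:
    """Median of |x - median(x)|, without the 1.4826 normal-consistency factor."""
    xs = list(values)
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

ids = range(74250)
print(round(coefficient_of_variation(ids), 8))
print(median_absolute_deviation(ids))
```

The same MAD definition explains amount_tsh below: more than half of its values are 0, so both the median and the MAD are 0.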
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
69572  1  < 0.1%
51488  1  < 0.1%
40697  1  < 0.1%
40221  1  < 0.1%
68749  1  < 0.1%
30265  1  < 0.1%
17495  1  < 0.1%
38573  1  < 0.1%
56018  1  < 0.1%
67660  1  < 0.1%
Other values (74240)  74240  > 99.9%

Minimum 10 values
Value  Count  Frequency (%)
0  1  < 0.1%
1  1  < 0.1%
2  1  < 0.1%
3  1  < 0.1%
4  1  < 0.1%
5  1  < 0.1%
6  1  < 0.1%
7  1  < 0.1%
8  1  < 0.1%
9  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
74249  1  < 0.1%
74248  1  < 0.1%
74247  1  < 0.1%
74246  1  < 0.1%
74245  1  < 0.1%
74244  1  < 0.1%
74243  1  < 0.1%
74242  1  < 0.1%
74241  1  < 0.1%
74240  1  < 0.1%
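The histogram caption refers to equal-width bins spanning [minimum, maximum]. Below is a stdlib sketch of fixed-width binning. It closes the last bin on the right so that the maximum is counted, as numpy.histogram does. It is an illustration, not the report's plotting code.

```python
def fixed_width_histogram(values, bins: int = 50):
    """Count values into `bins` equal-width intervals spanning [min, max].

    Every bin is half-open [left, right) except the last, which also includes
    the maximum. Raises ValueError on empty input (from min/max).
    """
    xs = list(values)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        # All values identical: put everything in the first bin
        idx = 0 if width == 0 else min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

counts, edges = fixed_width_histogram(range(74250), bins=50)
print(len(counts), sum(counts), min(counts), max(counts))
```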

amount_tsh
Real number (ℝ)

Skewed, Zeros

Distinct: 102
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 318.6857
Minimum: 0
Maximum: 350000
Zeros: 52049
Zeros (%): 70.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 20
95th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2906.7624
Coefficient of variation (CV): 9.1210943
Kurtosis: 4766.5651
Mean: 318.6857
Median Absolute Deviation (MAD): 0
Skewness: 56.370021
Sum: 23662414
Variance: 8449267.4
Monotonicity: Not monotonic
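With 70.1% zeros and a maximum of 350000, amount_tsh has a MAD of 0, while its skewness (56.37) and kurtosis (4766.57) are extreme. The sketch below implements the bias-adjusted sample skewness (G1) and excess kurtosis (G2) that pandas documents for Series.skew and Series.kurt. Whether the report used exactly these estimators is an assumption. As a sanity check, the flat sample 1, 2, 3, 4, 5 has zero skewness and an excess kurtosis of exactly -1.2.

```python
import math

def skewness(values) -> float:
    """Adjusted Fisher-Pearson sample skewness G1 = g1 * sqrt(n(n-1)) / (n-2)."""
    xs = list(values)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data: treat skewness as 0
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(values) -> float:
    """Bias-corrected excess kurtosis G2 = (n-1)/((n-2)(n-3)) * ((n+1) g2 + 6)."""
    xs = list(values)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data: treat kurtosis as 0
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# A zero-heavy sample with one large value, loosely mimicking amount_tsh
print(skewness([0, 0, 0, 0, 10]), excess_kurtosis([0, 0, 0, 0, 10]))
```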
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  52049  70.1%
500  3874  5.2%
50  3103  4.2%
1000  1858  2.5%
20  1812  2.4%
200  1516  2.0%
100  1034  1.4%
10  995  1.3%
30  929  1.3%
2000  882  1.2%
Other values (92)  6198  8.3%

Minimum 10 values
Value  Count  Frequency (%)
0  52049  70.1%
0.2  4  < 0.1%
0.25  1  < 0.1%
0.5  1  < 0.1%
1  3  < 0.1%
2  18  < 0.1%
3  1  < 0.1%
5  471  0.6%
6  231  0.3%
7  87  0.1%

Maximum 10 values
Value  Count  Frequency (%)
350000  1  < 0.1%
250000  1  < 0.1%
200000  2  < 0.1%
170000  1  < 0.1%
138000  1  < 0.1%
120000  1  < 0.1%
117000  7  < 0.1%
100000  4  < 0.1%
70000  2  < 0.1%
60000  2  < 0.1%

Distinct: 369
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Minimum: 2001-03-26 00:00:00
Maximum: 2013-12-03 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)

funder
Text

Missing

Distinct: 2139
Distinct (%): 3.1%
Missing: 4507
Missing (%): 6.1%
Memory size: 1.1 MiB

Length

Max length: 30
Median length: 27
Mean length: 9.916264
Min length: 1

Characters and Unicode

Total characters: 691590
Distinct characters: 70
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
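This report places every character in a single category, script and block, each shown as (unknown). For comparison, Python's standard unicodedata module can tag each character with its Unicode general category: Lu for upper-case letters, Ll for lower-case letters, Zs for spaces, and so on. This sketch is an independent illustration, not how ydata-profiling derives its categories.

```python
import unicodedata
from collections import Counter

def category_counts(texts) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs', 'Nd')."""
    counts = Counter()
    for text in texts:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two sample funder values from this report
print(category_counts(["Lottery Club", "Unicef"]))
```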

Unique

Unique: 1129
Unique (%): 1.6%

Sample

1st row: Roman
2nd row: Grumeti
3rd row: Lottery Club
4th row: Unicef
5th row: Action In A

Value  Count  Frequency (%)
of  12116  10.7%
government  11536  10.2%
tanzania  11406  10.1%
danida  3921  3.5%
world  3501  3.1%
water  3303  2.9%
hesawa  2783  2.5%
bank  1790  1.6%
kkkt  1732  1.5%
rwssp  1705  1.5%
Other values (2305)  59055  52.3%
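The table above counts lower-cased words rather than whole values. That is why "of", "government" and "tanzania", from names such as "Government Of Tanzania", lead the list. Below is a sketch of that kind of token count. Splitting on whitespace is an assumed tokenizer; the report's own rules may differ, for example in how punctuation is handled.

```python
from collections import Counter

def word_counts(texts, top: int = 10):
    """Lower-case each value, split on whitespace, and return the top tokens.

    Missing values (None) are skipped. The tokenizer is an assumption.
    """
    counter = Counter()
    for text in texts:
        if text is None:
            continue
        counter.update(text.lower().split())
    return counter.most_common(top)

sample = ["Government Of Tanzania", "Danida", "World Bank", "Government Of Tanzania", None]
print(word_counts(sample))
```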

Most occurring characters

Value  Count  Frequency (%)
a  85335  12.3%
n  72148  10.4%
i  47485  6.9%
e  46776  6.8%
(space)  43186  6.2%
r  34873  5.0%
t  28714  4.2%
o  28372  4.1%
s  21436  3.1%
d  19386  2.8%
Other values (60)  263879  38.2%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  691590  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  691590  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  691590  100.0%

gps_height
Real number (ℝ)

Zeros

Distinct: 2456
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 665.66731
Minimum: -90
Maximum: 2777
Zeros: 25649
Zeros (%): 34.5%
Negative: 1881
Negative (%): 2.5%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -90
5th percentile: 0
Q1: 0
Median: 364
Q3: 1317
95th percentile: 1796
Maximum: 2777
Range: 2867
Interquartile range (IQR): 1317

Descriptive statistics

Standard deviation: 692.76103
Coefficient of variation (CV): 1.0407016
Kurtosis: -1.2860423
Mean: 665.66731
Median Absolute Deviation (MAD): 364
Skewness: 0.46929439
Sum: 49425798
Variance: 479917.85
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  25649  34.5%
-16  71  0.1%
-15  69  0.1%
-13  68  0.1%
-19  65  0.1%
-14  64  0.1%
1290  60  0.1%
-18  60  0.1%
303  59  0.1%
-20  58  0.1%
Other values (2446)  48027  64.7%

Minimum 10 values
Value  Count  Frequency (%)
-90  1  < 0.1%
-63  2  < 0.1%
-59  1  < 0.1%
-57  2  < 0.1%
-56  1  < 0.1%
-55  1  < 0.1%
-54  1  < 0.1%
-53  1  < 0.1%
-52  2  < 0.1%
-51  3  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
2777  1  < 0.1%
2770  1  < 0.1%
2628  1  < 0.1%
2627  1  < 0.1%
2626  2  < 0.1%
2623  1  < 0.1%
2614  1  < 0.1%
2585  1  < 0.1%
2576  2  < 0.1%
2569  1  < 0.1%

installer
Text

Missing

Distinct: 2410
Distinct (%): 3.5%
Missing: 4532
Missing (%): 6.1%
Memory size: 1.1 MiB

Length

Max length: 30
Median length: 29
Mean length: 6.0973063
Min length: 1

Characters and Unicode

Total characters: 425092
Distinct characters: 71
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1245
Unique (%): 1.8%

Sample

1st row: Roman
2nd row: GRUMETI
3rd row: World vision
4th row: UNICEF
5th row: Artisan

Value  Count  Frequency (%)
dwe  22004  25.8%
government  3450  4.0%
water  2301  2.7%
hesawa  1768  2.1%
rwe  1526  1.8%
district  1491  1.7%
kkkt  1445  1.7%
council  1356  1.6%
commu  1354  1.6%
danida  1307  1.5%
Other values (2191)  47268  55.4%

Most occurring characters

Value  Count  Frequency (%)
D  34447  8.1%
W  32323  7.6%
E  31711  7.5%
a  21693  5.1%
n  20670  4.9%
e  19282  4.5%
i  18760  4.4%
A  17012  4.0%
r  16604  3.9%
t  15918  3.7%
Other values (61)  196672  46.3%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  425092  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  425092  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  425092  100.0%

longitude
Real number (ℝ)

Zeros

Distinct: 71870
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.074262
Minimum: 0
Maximum: 40.345193
Zeros: 2269
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 30.043194
Q1: 33.086819
Median: 34.907475
Q3: 37.181685
95th percentile: 39.13025
Maximum: 40.345193
Range: 40.345193
Interquartile range (IQR): 4.094866

Descriptive statistics

Standard deviation: 6.5725188
Coefficient of variation (CV): 0.19288807
Kurtosis: 19.148748
Mean: 34.074262
Median Absolute Deviation (MAD): 2.0389258
Skewness: -4.187363
Sum: 2530014
Variance: 43.198004
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  2269  3.1%
37.5320952  2  < 0.1%
32.99387218  2  < 0.1%
38.34050134  2  < 0.1%
37.54080503  2  < 0.1%
32.9936827  2  < 0.1%
32.9780624  2  < 0.1%
39.10375198  2  < 0.1%
39.09206155  2  < 0.1%
39.08843697  2  < 0.1%
Other values (71860)  71963  96.9%

Minimum 10 values
Value  Count  Frequency (%)
0  2269  3.1%
29.6071219  1  < 0.1%
29.60720109  1  < 0.1%
29.61032056  1  < 0.1%
29.61096482  1  < 0.1%
29.61194674  1  < 0.1%
29.61250689  1  < 0.1%
29.61276296  1  < 0.1%
29.61277618  1  < 0.1%
29.61344309  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
40.34519307  1  < 0.1%
40.34430089  1  < 0.1%
40.32523996  1  < 0.1%
40.32522643  1  < 0.1%
40.32501564  1  < 0.1%
40.32340181  1  < 0.1%
40.32283237  1  < 0.1%
40.32280453  1  < 0.1%
40.3226251  1  < 0.1%
40.32216902  1  < 0.1%

latitude
Real number (ℝ)

Distinct: 71869
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.701771
Minimum: -11.64944
Maximum: -2 × 10^-8
Zeros: 0
Zeros (%): 0.0%
Negative: 74250
Negative (%): 100.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -11.64944
5th percentile: -10.586484
Q1: -8.525675
Median: -5.0265399
Q3: -3.3250579
95th percentile: -1.4081268
Maximum: -2 × 10^-8
Range: 11.64944
Interquartile range (IQR): 5.2006171

Descriptive statistics

Standard deviation: 2.9449691
Coefficient of variation (CV): -0.51650077
Kurtosis: -1.0542077
Mean: -5.701771
Median Absolute Deviation (MAD): 2.0688622
Skewness: -0.15288081
Sum: -423356.5
Variance: 8.6728431
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
-2 × 10^-8  2269  3.1%
-2.49645868  2  < 0.1%
-7.05637235  2  < 0.1%
-6.98602609  2  < 0.1%
-6.95674564  2  < 0.1%
-2.49454559  2  < 0.1%
-2.490689  2  < 0.1%
-2.51661892  2  < 0.1%
-2.48004347  2  < 0.1%
-7.17908174  2  < 0.1%
Other values (71859)  71963  96.9%

Minimum 10 values
Value  Count  Frequency (%)
-11.64944018  1  < 0.1%
-11.64837759  1  < 0.1%
-11.58629656  1  < 0.1%
-11.56857679  1  < 0.1%
-11.56680457  1  < 0.1%
-11.56459195  1  < 0.1%
-11.56450865  1  < 0.1%
-11.56432357  1  < 0.1%
-11.56231592  1  < 0.1%
-11.56228898  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
-2 × 10^-8  2269  3.1%
-0.99846435  1  < 0.1%
-0.99875229  1  < 0.1%
-0.998916  1  < 0.1%
-0.99901209  1  < 0.1%
-0.99911702  1  < 0.1%
-0.9994692  1  < 0.1%
-0.99950651  1  < 0.1%
-0.99952232  1  < 0.1%
-1.00058519  1  < 0.1%
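longitude has 2269 zeros, and latitude has exactly 2269 values of -2 × 10^-8. Both sit outside the range of the remaining coordinates (longitude roughly 29.6 to 40.3). This suggests, without proving it, that they come from the same rows and encode a missing GPS position. The sketch below maps this assumed placeholder pair to NaN before any spatial analysis. The constant names and the tolerance are hypothetical.

```python
import math

# Hypothetical placeholder pair, inferred from the matching 2269 counts above
PLACEHOLDER_LON = 0.0
PLACEHOLDER_LAT = -2e-08

def clean_coordinates(lon: float, lat: float, tol: float = 1e-6):
    """Return (lon, lat), or (nan, nan) if the pair matches the assumed placeholder."""
    if abs(lon - PLACEHOLDER_LON) < tol and abs(lat - PLACEHOLDER_LAT) < tol:
        return (math.nan, math.nan)
    return (lon, lat)

print(clean_coordinates(0.0, -2e-08))
print(clean_coordinates(34.907475, -5.0265399))
```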

Distinct: 45683
Distinct (%): 61.5%
Missing: 2
Missing (%): < 0.1%
Memory size: 1.1 MiB

Length

Max length: 30
Median length: 25
Mean length: 10.977171
Min length: 1

Characters and Unicode

Total characters: 815033
Distinct characters: 76
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 39882
Unique (%): 53.7%

Sample

1st row: none
2nd row: Zahanati
3rd row: Kwa Mahundi
4th row: Zahanati Ya Nanyumbu
5th row: Shuleni

Value  Count  Frequency (%)
kwa  26774  19.6%
none  4440  3.2%
mzee  4264  3.1%
shuleni  2696  2.0%
ya  1865  1.4%
shule  1755  1.3%
school  1403  1.0%
primary  1335  1.0%
zahanati  1231  0.9%
msingi  1102  0.8%
Other values (34870)  89851  65.7%

Most occurring characters

Value  Count  Frequency (%)
a  123688  15.2%
i  65417  8.0%
(space)  62473  7.7%
n  52634  6.5%
e  51422  6.3%
w  39672  4.9%
K  39197  4.8%
o  37761  4.6%
u  30433  3.7%
M  27612  3.4%
Other values (66)  284724  34.9%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  815033  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  815033  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  815033  100.0%

num_private
Real number (ℝ)

Skewed, Zeros

Distinct: 68
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.46232997
Minimum: 0
Maximum: 1776
Zeros: 73299
Zeros (%): 98.7%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 11.537879
Coefficient of variation (CV): 24.955939
Kurtosis: 11449.87
Mean: 0.46232997
Median Absolute Deviation (MAD): 0
Skewness: 91.326983
Sum: 34328
Variance: 133.12264
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  73299  98.7%
1  94  0.1%
6  92  0.1%
5  60  0.1%
8  58  0.1%
15  47  0.1%
32  45  0.1%
45  41  0.1%
3  38  0.1%
93  37  < 0.1%
Other values (58)  439  0.6%

Minimum 10 values
Value  Count  Frequency (%)
0  73299  98.7%
1  94  0.1%
2  31  < 0.1%
3  38  0.1%
4  30  < 0.1%
5  60  0.1%
6  92  0.1%
7  31  < 0.1%
8  58  0.1%
9  4  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
1776  1  < 0.1%
1402  1  < 0.1%
755  1  < 0.1%
698  1  < 0.1%
672  1  < 0.1%
669  1  < 0.1%
668  1  < 0.1%
450  1  < 0.1%
420  1  < 0.1%
300  1  < 0.1%

basin
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Lake Victoria  12871
Pangani  11143
Rufiji  9987
Internal  9642
Lake Tanganyika  8052
Other values (4)  22555

Length

Max length: 23
Median length: 11
Mean length: 10.894545
Min length: 6

Characters and Unicode

Total characters: 808920
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lake Nyasa
2nd row: Lake Victoria
3rd row: Pangani
4th row: Ruvuma / Southern Coast
5th row: Lake Victoria

Common Values

Value  Count  Frequency (%)
Lake Victoria  12871  17.3%
Pangani  11143  15.0%
Rufiji  9987  13.5%
Internal  9642  13.0%
Lake Tanganyika  8052  10.8%
Wami / Ruvu  7577  10.2%
Lake Nyasa  6332  8.5%
Ruvuma / Southern Coast  5587  7.5%
Lake Rukwa  3059  4.1%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
lake  30314  22.2%
/  13164  9.6%
victoria  12871  9.4%
pangani  11143  8.2%
rufiji  9987  7.3%
internal  9642  7.1%
tanganyika  8052  5.9%
wami  7577  5.6%
ruvu  7577  5.6%
nyasa  6332  4.6%
Other values (4)  19820  14.5%

Most occurring characters

Value  Count  Frequency (%)
a  133743  16.5%
i  72488  9.0%
n  63261  7.8%
(space)  62229  7.7%
e  45543  5.6%
u  44961  5.6%
k  41425  5.1%
t  33687  4.2%
L  30314  3.7%
r  28100  3.5%
Other values (22)  253169  31.3%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  808920  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  808920  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  808920  100.0%

Distinct: 21425
Distinct (%): 29.0%
Missing: 470
Missing (%): 0.6%
Memory size: 1.1 MiB

Length

Max length: 30
Median length: 27
Mean length: 7.898997
Min length: 1

Characters and Unicode

Total characters: 582788
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 9752
Unique (%): 13.2%

Sample

1st row: Mnyusi B
2nd row: Nyamara
3rd row: Majengo
4th row: Mahakamani
5th row: Kyanyamisa

Value  Count  Frequency (%)
a  3016  3.4%
b  2524  2.9%
kati  2351  2.7%
majengo  768  0.9%
wa  762  0.9%
shuleni  754  0.9%
madukani  709  0.8%
mtaa  656  0.7%
juu  504  0.6%
mjini  458  0.5%
Other values (18756)  76014  85.9%

Most occurring characters

Value  Count  Frequency (%)
a  90081  15.5%
i  56910  9.8%
n  41876  7.2%
u  32997  5.7%
e  32135  5.5%
o  29502  5.1%
M  25477  4.4%
g  23754  4.1%
l  20522  3.5%
m  18839  3.2%
Other values (63)  210695  36.2%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  582788  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  582788  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  582788  100.0%

region
Categorical

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB
Iringa  6599
Shinyanga  6293
Mbeya  5758
Kilimanjaro  5494
Morogoro  5038
Other values (16)  45068

Length

Max length: 13
Median length: 11
Mean length: 6.6294141
Min length: 4

Characters and Unicode

Total characters: 492234
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Iringa
2nd row: Mara
3rd row: Manyara
4th row: Mtwara
5th row: Kagera

Common Values

Value  Count  Frequency (%)
Iringa  6599  8.9%
Shinyanga  6293  8.5%
Mbeya  5758  7.8%
Kilimanjaro  5494  7.4%
Morogoro  5038  6.8%
Kagera  4174  5.6%
Arusha  4111  5.5%
Mwanza  3897  5.2%
Kigoma  3533  4.8%
Pwani  3331  4.5%
Other values (11)  26022  35.0%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
iringa  6599  8.6%
shinyanga  6293  8.2%
mbeya  5758  7.5%
kilimanjaro  5494  7.2%
morogoro  5038  6.6%
kagera  4174  5.5%
arusha  4111  5.4%
mwanza  3897  5.1%
kigoma  3533  4.6%
pwani  3331  4.4%
Other values (13)  28062  36.8%

Most occurring characters

Value  Count  Frequency (%)
a  104401  21.2%
n  41521  8.4%
r  40507  8.2%
i  39656  8.1%
o  37203  7.6%
g  31359  6.4%
M  21260  4.3%
m  16132  3.3%
y  14023  2.8%
K  13201  2.7%
Other values (22)  132971  27.0%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  492234  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  492234  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  492234  100.0%

region_code
Real number (ℝ)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.265414
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 5
Median: 12
Q3: 17
95th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.508907
Coefficient of variation (CV): 1.1469657
Kurtosis: 10.354697
Mean: 15.265414
Median Absolute Deviation (MAD): 6
Skewness: 3.1794543
Sum: 1133457
Variance: 306.56182
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)

Common values
Value  Count  Frequency (%)
11  6608  8.9%
17  6334  8.5%
12  5759  7.8%
3  5494  7.4%
5  5079  6.8%
18  4183  5.6%
19  3824  5.2%
2  3709  5.0%
16  3533  4.8%
10  3306  4.5%
Other values (17)  26421  35.6%

Minimum 10 values
Value  Count  Frequency (%)
1  2779  3.7%
2  3709  5.0%
3  5494  7.4%
4  3145  4.2%
5  5079  6.8%
6  2032  2.7%
7  1020  1.4%
8  375  0.5%
9  499  0.7%
10  3306  4.5%

Maximum 10 values
Value  Count  Frequency (%)
99  512  0.7%
90  1133  1.5%
80  1536  2.1%
60  1298  1.7%
40  1  < 0.1%
24  402  0.5%
21  1972  2.7%
20  2451  3.3%
19  3824  5.2%
18  4183  5.6%

district_code
Real number (ℝ)

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6290774
Minimum: 0
Maximum: 80
Zeros: 27
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.6416356
Coefficient of variation (CV): 1.712827
Kurtosis: 16.191722
Mean: 5.6290774
Median Absolute Deviation (MAD): 1
Skewness: 3.9614329
Sum: 417959
Variance: 92.961136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common values
Value  Count  Frequency (%)
1  15299  20.6%
2  13929  18.8%
3  12521  16.9%
4  11253  15.2%
5  5428  7.3%
6  5108  6.9%
7  4166  5.6%
8  1282  1.7%
30  1256  1.7%
33  1063  1.4%
Other values (10)  2945  4.0%

Minimum 10 values
Value  Count  Frequency (%)
0  27  < 0.1%
1  15299  20.6%
2  13929  18.8%
3  12521  16.9%
4  11253  15.2%
5  5428  7.3%
6  5108  6.9%
7  4166  5.6%
8  1282  1.7%
13  496  0.7%

Maximum 10 values
Value  Count  Frequency (%)
80  13  < 0.1%
67  8  < 0.1%
63  264  0.4%
62  127  0.2%
60  76  0.1%
53  921  1.2%
43  653  0.9%
33  1063  1.4%
30  1256  1.7%
23  360  0.5%

lga
Text

Distinct: 125
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Length

Max length: 16
Median length: 14
Mean length: 7.4073805
Min length: 3

Characters and Unicode

Total characters: 549998
Distinct characters: 41
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ludewa
2nd row: Serengeti
3rd row: Simanjiro
4th row: Nanyumbu
5th row: Karagwe

Value  Count  Frequency (%)
rural  11814  13.4%
njombe  3128  3.5%
urban  2118  2.4%
moshi  1669  1.9%
arusha  1603  1.8%
bariadi  1485  1.7%
singida  1410  1.6%
rungwe  1381  1.6%
kilosa  1368  1.6%
kasulu  1322  1.5%
Other values (106)  60884  69.0%

Most occurring characters

Value  Count  Frequency (%)
a  87352  15.9%
o  37693  6.9%
i  36767  6.7%
u  35252  6.4%
r  33487  6.1%
e  28292  5.1%
n  28081  5.1%
l  23976  4.4%
g  22965  4.2%
M  19956  3.6%
Other values (31)  196177  35.7%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  549998  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  549998  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  549998  100.0%

ward
Text

Distinct: 2098
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Length

Max length: 23
Median length: 19
Mean length: 7.5064242
Min length: 3

Characters and Unicode

Total characters: 557352
Distinct characters: 54
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 21
Unique (%): < 0.1%

Sample

1st row: Mundindi
2nd row: Natta
3rd row: Ngorika
4th row: Nanyumbu
5th row: Nyakasimbi

Value  Count  Frequency (%)
mashariki  720  0.9%
urban  666  0.8%
siha  550  0.7%
kusini  488  0.6%
magharibi  472  0.6%
igosi  386  0.5%
masama  382  0.5%
machame  363  0.4%
kati  342  0.4%
imalinyi  318  0.4%
Other values (2112)  76252  94.2%

Most occurring characters

Value  Count  Frequency (%)
a  86986  15.6%
i  50510  9.1%
n  36882  6.6%
u  33914  6.1%
o  32443  5.8%
e  29383  5.3%
g  26356  4.7%
M  23580  4.2%
m  20301  3.6%
l  19770  3.5%
Other values (44)  197227  35.4%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  557352  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  557352  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  557352  100.0%

population
Real number (ℝ)

Zeros

Distinct: 1128
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180.75083
Minimum: 0
Maximum: 30500
Zeros: 26834
Zeros (%): 36.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 25
Q3: 215
95th percentile: 690
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.08612
Coefficient of variation (CV): 2.6062736
Kurtosis: 343.36556
Mean: 180.75083
Median Absolute Deviation (MAD): 25
Skewness: 11.780615
Sum: 13420749
Variance: 221922.13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
0  26834  36.1%
1  8782  11.8%
200  2370  3.2%
150  2328  3.1%
250  2087  2.8%
300  1842  2.5%
50  1437  1.9%
100  1419  1.9%
500  1274  1.7%
350  1252  1.7%
Other values (1118)  24625  33.2%

Minimum 10 values
Value  Count  Frequency (%)
0  26834  36.1%
1  8782  11.8%
2  9  < 0.1%
3  6  < 0.1%
4  15  < 0.1%
5  50  0.1%
6  27  < 0.1%
7  3  < 0.1%
8  29  < 0.1%
9  12  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
30500  1  < 0.1%
15300  1  < 0.1%
11469  1  < 0.1%
11463  1  < 0.1%
10000  3  < 0.1%
9865  1  < 0.1%
9800  1  < 0.1%
9500  1  < 0.1%
9000  4  < 0.1%
8848  1  < 0.1%

public_meeting
Boolean

Imbalance  Missing 

Distinct       2
Distinct (%)   < 0.1%
Missing        4155
Missing (%)    5.6%
Memory size    1.1 MiB

True        63749
False       6346
(Missing)   4155

Value       Count   Frequency (%)
True        63749   85.9%
False       6346    8.5%
(Missing)   4155    5.6%
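The frequencies in this table are taken over all 74250 rows, so missing values stay in the denominator (63749 / 74250 ≈ 85.9%). A small sketch of that calculation, using Python's None to stand in for a missing value and rebuilding the column from the reported counts:

```python
from collections import Counter

def boolean_frequencies(values):
    """Count True/False/missing and express each as a percentage of all rows."""
    labels = ["(Missing)" if v is None else str(v) for v in values]
    counts = Counter(labels)
    total = len(values)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

# Rebuild the public_meeting column from the counts reported above
column = [True] * 63749 + [False] * 6346 + [None] * 4155
for label, (n, pct) in boolean_frequencies(column).items():
    print(label, n, f"{pct}%")
```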

recorded_by
Categorical

Constant 

Distinct       1
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

GeoData Consultants Ltd   74250

Length

Max length      23
Median length   23
Mean length     23
Min length      23

Characters and Unicode

Total characters      1707750
Distinct characters   14
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   GeoData Consultants Ltd
2nd row   GeoData Consultants Ltd
3rd row   GeoData Consultants Ltd
4th row   GeoData Consultants Ltd
5th row   GeoData Consultants Ltd

Common Values

Value                     Count   Frequency (%)
GeoData Consultants Ltd   74250   100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value         Count   Frequency (%)
geodata       74250   33.3%
consultants   74250   33.3%
ltd           74250   33.3%

Most occurring characters

Value              Count    Frequency (%)
t                  297000   17.4%
a                  222750   13.0%
o                  148500   8.7%
(space)            148500   8.7%
n                  148500   8.7%
s                  148500   8.7%
G                  74250    4.3%
e                  74250    4.3%
D                  74250    4.3%
C                  74250    4.3%
Other values (4)   297000   17.4%
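Because the column holds one string repeated 74250 times, each character count is the per-string count multiplied by the row count: "t" occurs 4 times in "GeoData Consultants Ltd", giving 4 × 74250 = 297000. A sketch of that weighted count (the function name is illustrative, not part of ydata-profiling):

```python
from collections import Counter

def weighted_char_counts(value_counts):
    """Count characters across all rows, given {string: number_of_rows}."""
    totals = Counter()
    for text, rows in value_counts.items():
        for char, n in Counter(text).items():
            totals[char] += n * rows   # each row contributes the string's counts
    return totals

chars = weighted_char_counts({"GeoData Consultants Ltd": 74250})
print(chars.most_common(3))
print(sum(chars.values()))  # total characters
```

The same total (23 characters × 74250 rows = 1707750) appears as "Total characters" above.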

Most occurring categories

Value       Count     Frequency (%)
(unknown)   1707750   100.0%

Most occurring scripts

Value       Count     Frequency (%)
(unknown)   1707750   100.0%

Most occurring blocks

Value       Count     Frequency (%)
(unknown)   1707750   100.0%

scheme_management
Categorical

Missing 

Distinct       11
Distinct (%)   < 0.1%
Missing        4847
Missing (%)    6.5%
Memory size    1.1 MiB

VWC                45917
WUG                6496
Water authority    3975
WUA                3551
Water Board        3462
Other values (6)   6002

Length

Max length      16
Median length   3
Mean length     4.6575941
Min length      3

Characters and Unicode

Total characters      323251
Distinct characters   28
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   VWC
2nd row   Other
3rd row   VWC
4th row   VWC
5th row   VWC

Common Values

Value              Count   Frequency (%)
VWC                45917   61.8%
WUG                6496    8.7%
Water authority    3975    5.4%
WUA                3551    4.8%
Water Board        3462    4.7%
Parastatal         2124    2.9%
Company            1341    1.8%
Private operator   1326    1.8%
Other              996     1.3%
SWC                123     0.2%
(Missing)          4847    6.5%

Length

[Figure: Histogram of lengths of the category]

Words

Value              Count   Frequency (%)
vwc                45917   58.7%
water              7437    9.5%
wug                6496    8.3%
authority          3975    5.1%
wua                3551    4.5%
board              3462    4.4%
parastatal         2124    2.7%
company            1341    1.7%
private            1326    1.7%
operator           1326    1.7%
Other values (3)   1211    1.5%
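The word table splits each category label into lower-cased words, so "Water authority" and "Water Board" both feed "water" (3975 + 3462 = 7437). A sketch of that weighting, assuming plain whitespace splitting (the report's own tokenizer may treat punctuation differently):

```python
from collections import Counter

def word_counts(value_counts):
    """Split each label into lower-cased words, weighting by the label's row count.
    Whitespace splitting is an assumption about how the report tokenizes."""
    words = Counter()
    for label, rows in value_counts.items():
        for word in label.lower().split():
            words[word] += rows
    return words

# A subset of the scheme_management counts reported above
counts = {"VWC": 45917, "Water authority": 3975, "Water Board": 3462,
          "Private operator": 1326}
print(word_counts(counts).most_common(3))
```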

Most occurring characters

Value               Count   Frequency (%)
W                   63524   19.7%
C                   47381   14.7%
V                   45917   14.2%
a                   27363   8.5%
t                   23375   7.2%
r                   22064   6.8%
o                   11430   3.5%
e                   11085   3.4%
U                   10047   3.1%
(space)             8763    2.7%
Other values (18)   52302   16.2%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   323251   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   323251   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   323251   100.0%

scheme_name
Text

Missing 

Distinct       2867
Distinct (%)   7.5%
Missing        36052
Missing (%)    48.6%
Memory size    1.1 MiB

Length

Max length      46
Median length   37
Mean length     14.487539
Min length      1

Characters and Unicode

Total characters      553395
Distinct characters   68
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       752
Unique (%)   2.0%
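The distinct and unique percentages here are consistent with a denominator of 38198 non-missing values (74250 - 36052) rather than all rows: 2867 / 38198 ≈ 7.5% and 752 / 38198 ≈ 2.0%, whereas 2867 / 74250 would be about 3.9%. A sketch of both measures, where "unique" means a value that occurs exactly once; the sample names are taken from the Sample rows, with None added to mark missing entries:

```python
from collections import Counter

def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once.
    Percentages use the non-missing count as the denominator."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100 * distinct / n, unique, 100 * unique / n

# Hypothetical column: sample scheme names plus missing entries
names = ["Roman", None, "Zingibali", "Roman", None, "BL Bondeni"]
print(distinct_and_unique(names))
```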

Sample

1st row   Roman
2nd row   Nyumba ya mungu pipe scheme
3rd row   Zingibali
4th row   BL Bondeni
5th row   wanging'ombe water supply s

Words

Value                 Count   Frequency (%)
water                 12153   13.7%
supply                8382    9.4%
scheme                3152    3.5%
wa                    2693    3.0%
gravity               2356    2.7%
maji                  1668    1.9%
pipe                  1640    1.8%
mradi                 1371    1.5%
line                  1225    1.4%
supplied              1091    1.2%
Other values (2623)   53165   59.8%

Most occurring characters

Value               Count    Frequency (%)
a                   60494    10.9%
(space)             51290    9.3%
e                   43069    7.8%
i                   32776    5.9%
p                   27880    5.0%
r                   27119    4.9%
t                   23804    4.3%
u                   23019    4.2%
l                   21577    3.9%
n                   21341    3.9%
Other values (58)   221026   39.9%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   553395   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   553395   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   553395   100.0%

permit
Boolean

Missing 

Distinct       2
Distinct (%)   < 0.1%
Missing        3793
Missing (%)    5.1%
Memory size    1.1 MiB

True        48606
False       21851
(Missing)   3793

Value       Count   Frequency (%)
True        48606   65.5%
False       21851   29.4%
(Missing)   3793    5.1%

construction_year
Real number (ℝ)

Zeros 

Distinct          55
Distinct (%)      0.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              1298.4636
Minimum           0
Maximum           2013
Zeros             25969
Zeros (%)         35.0%
Negative          0
Negative (%)      0.0%
Memory size       1.1 MiB

Quantile statistics

Minimum                     0
5-th percentile             0
Q1                          0
median                      1986
Q3                          2004
95-th percentile            2010
Maximum                     2013
Range                       2013
Interquartile range (IQR)   2004

Descriptive statistics

Standard deviation                952.34938
Coefficient of variation (CV)     0.73344323
Kurtosis                          -1.6029319
Mean                              1298.4636
Median Absolute Deviation (MAD)   22
Skewness                          -0.62978407
Sum                               96410926
Variance                          906969.33
Monotonicity                      Not monotonic
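With 35.0% of rows at 0, the mean (1298.46) and the lower quartiles are driven by what looks like a placeholder rather than a real year; reading 0 as "unknown" is an assumption, not something the report states. A sketch that sets zeros aside before summarising, run on a small made-up sample:

```python
import statistics

def year_summary(years):
    """Summarise construction years, treating 0 as 'unknown' (an assumption)."""
    known = [y for y in years if y != 0]
    return {
        "zeros_pct": 100 * (len(years) - len(known)) / len(years),
        "mean_all": statistics.fmean(years),      # what the report shows
        "median_known": statistics.median(known),
        "mean_known": statistics.fmean(known),
    }

# Hypothetical sample, not the real column
sample = [0, 0, 1986, 2004, 2010, 2008, 0, 1995]
print(year_summary(sample))
```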
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count   Frequency (%)
0                   25969   35.0%
2010                3314    4.5%
2008                3243    4.4%
2009                3196    4.3%
2000                2578    3.5%
2007                1960    2.6%
2006                1892    2.5%
2011                1591    2.1%
2003                1579    2.1%
2004                1417    1.9%
Other values (45)   27511   37.1%

Minimum 10 values

Value   Count   Frequency (%)
0       25969   35.0%
1960    124     0.2%
1961    28      < 0.1%
1962    36      < 0.1%
1963    107     0.1%
1964    48      0.1%
1965    21      < 0.1%
1966    19      < 0.1%
1967    106     0.1%
1968    93      0.1%

Maximum 10 values

Value   Count   Frequency (%)
2013    209     0.3%
2012    1347    1.8%
2011    1591    2.1%
2010    3314    4.5%
2009    3196    4.3%
2008    3243    4.4%
2007    1960    2.6%
2006    1892    2.5%
2005    1275    1.7%
2004    1417    1.9%

extraction_type
Categorical

Distinct       18
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

gravity             33263
nira/tanira         10205
other               8102
submersible         5982
swn 80              4588
Other values (13)   12110

Length

Max length      25
Median length   17
Mean length     7.7207003
Min length      3

Characters and Unicode

Total characters      573262
Distinct characters   29
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   gravity
2nd row   gravity
3rd row   gravity
4th row   submersible
5th row   gravity

Common Values

Value               Count   Frequency (%)
gravity             33263   44.8%
nira/tanira         10205   13.7%
other               8102    10.9%
submersible         5982    8.1%
swn 80              4588    6.2%
mono                3628    4.9%
india mark ii       3029    4.1%
afridev             2208    3.0%
ksb                 1790    2.4%
other - rope pump   572     0.8%
Other values (8)    883     1.2%

Length

[Figure: Histogram of lengths of the category]

Words

Value               Count   Frequency (%)
gravity             33263   37.8%
nira/tanira         10205   11.6%
other               9061    10.3%
submersible         5982    6.8%
swn                 4872    5.5%
80                  4588    5.2%
mono                3628    4.1%
india               3164    3.6%
mark                3164    3.6%
ii                  3029    3.4%
Other values (13)   7085    8.0%

Most occurring characters

Value               Count    Frequency (%)
i                   75123    13.1%
r                   74660    13.0%
a                   72622    12.7%
t                   52529    9.2%
v                   35471    6.2%
y                   33366    5.8%
g                   33265    5.8%
n                   32230    5.6%
e                   23913    4.2%
s                   18628    3.2%
Other values (19)   121455   21.2%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   573262   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   573262   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   573262   100.0%

extraction_type_group
Categorical

Distinct       13
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

gravity            33263
nira/tanira        10205
other              8102
submersible        7772
swn 80             4588
Other values (8)   10320

Length

Max length      15
Median length   14
Mean length     7.8831785
Min length      4

Characters and Unicode

Total characters      585326
Distinct characters   26
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   gravity
2nd row   gravity
3rd row   gravity
4th row   submersible
5th row   gravity

Common Values

Value              Count   Frequency (%)
gravity            33263   44.8%
nira/tanira        10205   13.7%
other              8102    10.9%
submersible        7772    10.5%
swn 80             4588    6.2%
mono               3628    4.9%
india mark ii      3029    4.1%
afridev            2208    3.0%
rope pump          572     0.8%
other handpump     447     0.6%
Other values (3)   436     0.6%

Length

[Figure: Histogram of lengths of the category]

Words

Value              Count   Frequency (%)
gravity            33263   38.5%
nira/tanira        10205   11.8%
other              8698    10.1%
submersible        7772    9.0%
swn                4588    5.3%
80                 4588    5.3%
mono               3628    4.2%
mark               3164    3.7%
india              3164    3.7%
ii                 3029    3.5%
Other values (7)   4235    4.9%

Most occurring characters

Value               Count    Frequency (%)
i                   76596    13.1%
r                   76388    13.1%
a                   72861    12.4%
t                   52315    8.9%
v                   35471    6.1%
g                   33263    5.7%
y                   33263    5.7%
n                   32389    5.5%
e                   27326    4.7%
s                   20132    3.4%
Other values (16)   125322   21.4%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   585326   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   585326   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   585326   100.0%

extraction_type_class
Categorical

Distinct       7
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

gravity            33263
handpump           20612
other              8102
submersible        7772
motorpump          3777
Other values (2)   724
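This summary lists the five most frequent classes and folds the remaining two (rope pump, 572, and wind-powered, 152, from the Common Values table for this variable) into a single "Other values (2)" row of 724. A sketch of that folding step; the function name is illustrative, not a ydata-profiling API:

```python
from collections import Counter

def top_k_with_other(counts, k=5):
    """Return the k most frequent (value, count) pairs plus one 'Other values' row."""
    top = Counter(counts).most_common(k)
    shown = {value for value, _ in top}
    rest = [n for value, n in counts.items() if value not in shown]
    if rest:
        top.append((f"Other values ({len(rest)})", sum(rest)))
    return top

extraction_class = {"gravity": 33263, "handpump": 20612, "other": 8102,
                    "submersible": 7772, "motorpump": 3777,
                    "rope pump": 572, "wind-powered": 152}
for value, n in top_k_with_other(extraction_class):
    print(value, n)
```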

Length

Max length      12
Median length   11
Mean length     7.6054411
Min length      5

Characters and Unicode

Total characters      564704
Distinct characters   21
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   gravity
2nd row   gravity
3rd row   gravity
4th row   submersible
5th row   gravity

Common Values

Value          Count   Frequency (%)
gravity        33263   44.8%
handpump       20612   27.8%
other          8102    10.9%
submersible    7772    10.5%
motorpump      3777    5.1%
rope pump      572     0.8%
wind-powered   152     0.2%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value          Count   Frequency (%)
gravity        33263   44.5%
handpump       20612   27.5%
other          8102    10.8%
submersible    7772    10.4%
motorpump      3777    5.0%
rope           572     0.8%
pump           572     0.8%
wind-powered   152     0.2%

Most occurring characters

Value               Count    Frequency (%)
a                   53875    9.5%
r                   53638    9.5%
p                   50646    9.0%
t                   45142    8.0%
i                   41187    7.3%
m                   36510    6.5%
g                   33263    5.9%
y                   33263    5.9%
v                   33263    5.9%
u                   32733    5.8%
Other values (11)   151184   26.8%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   564704   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   564704   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   564704   100.0%

management
Categorical

Distinct       12
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

vwc                50624
wug                8108
water board        3688
wua                3118
private operator   2504
Other values (7)   6208

Length

Max length      16
Median length   3
Mean length     4.3611448
Min length      3

Characters and Unicode

Total characters      323815
Distinct characters   23
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   vwc
2nd row   wug
3rd row   vwc
4th row   vwc
5th row   other

Common Values

Value              Count   Frequency (%)
vwc                50624   68.2%
wug                8108    10.9%
water board        3688    5.0%
wua                3118    4.2%
private operator   2504    3.4%
parastatal         2229    3.0%
water authority    1123    1.5%
other              1083    1.5%
company            859     1.2%
unknown            683     0.9%
Other values (2)   231     0.3%

Length

[Figure: Histogram of lengths of the category]

Words

Value              Count   Frequency (%)
vwc                50624   61.9%
wug                8108    9.9%
water              4811    5.9%
board              3688    4.5%
wua                3118    3.8%
private            2504    3.1%
operator           2504    3.1%
parastatal         2229    2.7%
other              1209    1.5%
authority          1123    1.4%
Other values (5)   1899    2.3%

Most occurring characters

Value               Count   Frequency (%)
w                   67344   20.8%
v                   53128   16.4%
c                   51609   15.9%
a                   27523   8.5%
r                   20677   6.4%
t                   17942   5.5%
u                   13137   4.1%
o                   12822   4.0%
e                   11028   3.4%
g                   8108    2.5%
Other values (13)   40497   12.5%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   323815   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   323815   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   323815   100.0%

management_group
Categorical

Imbalance 

Distinct       5
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

user-group   65538
commercial   4591
parastatal   2229
other        1209
unknown      683

Length

Max length      10
Median length   10
Mean length     9.8909899
Min length      5

Characters and Unicode

Total characters      734406
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   user-group
2nd row   user-group
3rd row   user-group
4th row   user-group
5th row   other

Common Values

Value        Count   Frequency (%)
user-group   65538   88.3%
commercial   4591    6.2%
parastatal   2229    3.0%
other        1209    1.6%
unknown      683     0.9%
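The 69.1% imbalance flagged for this column matches a score of one minus the normalised Shannon entropy, 1 - H / ln(k), where k is the number of distinct values; the same formula also gives the 67.9% reported for quality_group. Treating that formula as an assumption about how ydata-profiling computes the alert, the sketch below reproduces the figure from the counts above:

```python
import math

def imbalance(counts):
    """1 - (Shannon entropy / ln(number of classes)); 0 = balanced, 1 = one class.
    Assumed to match the report's imbalance score."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(len(counts))

management_group = [65538, 4591, 2229, 1209, 683]
print(f"{imbalance(management_group):.1%}")
```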

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value        Count   Frequency (%)
user-group   65538   88.3%
commercial   4591    6.2%
parastatal   2229    3.0%
other        1209    1.6%
unknown      683     0.9%

Most occurring characters

Value              Count    Frequency (%)
r                  139105   18.9%
u                  131759   17.9%
o                  72021    9.8%
e                  71338    9.7%
s                  67767    9.2%
p                  67767    9.2%
-                  65538    8.9%
g                  65538    8.9%
a                  13507    1.8%
m                  9182     1.3%
Other values (8)   30884    4.2%
4.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 734406
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
r 139105
18.9%
u 131759
17.9%
o 72021
9.8%
e 71338
9.7%
s 67767
9.2%
p 67767
9.2%
- 65538
8.9%
g 65538
8.9%
a 13507
 
1.8%
m 9182
 
1.3%
Other values (8) 30884
 
4.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 734406
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
r 139105
18.9%
u 131759
17.9%
o 72021
9.8%
e 71338
9.7%
s 67767
9.2%
p 67767
9.2%
- 65538
8.9%
g 65538
8.9%
a 13507
 
1.8%
m 9182
 
1.3%
Other values (8) 30884
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 734406
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
r 139105
18.9%
u 131759
17.9%
o 72021
9.8%
e 71338
9.7%
s 67767
9.2%
p 67767
9.2%
- 65538
8.9%
g 65538
8.9%
a 13507
 
1.8%
m 9182
 
1.3%
Other values (8) 30884
 
4.2%

payment
Categorical

Distinct       7
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

never pay               31712
pay per bucket          11266
pay monthly             10397
unknown                 10149
pay when scheme fails   4842
Other values (2)        5884

Length

Max length      21
Median length   14
Mean length     10.661737
Min length      5

Characters and Unicode

Total characters      791634
Distinct characters   21
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   pay annually
2nd row   never pay
3rd row   pay per bucket
4th row   never pay
5th row   never pay

Common Values

Value                   Count   Frequency (%)
never pay               31712   42.7%
pay per bucket          11266   15.2%
pay monthly             10397   14.0%
unknown                 10149   13.7%
pay when scheme fails   4842    6.5%
pay annually            4570    6.2%
other                   1314    1.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value      Count   Frequency (%)
pay        62787   39.7%
never      31712   20.1%
per        11266   7.1%
bucket     11266   7.1%
monthly    10397   6.6%
unknown    10149   6.4%
when       4842    3.1%
scheme     4842    3.1%
fails      4842    3.1%
annually   4570    2.9%

Most occurring characters

Value               Count    Frequency (%)
e                   101796   12.9%
n                   86538    10.9%
(space)             83737    10.6%
y                   77754    9.8%
a                   76769    9.7%
p                   74053    9.4%
r                   44292    5.6%
v                   31712    4.0%
u                   25985    3.3%
l                   24379    3.1%
Other values (11)   164619   20.8%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   791634   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   791634   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   791634   100.0%

payment_type
Categorical

Distinct       7
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

never pay          31712
per bucket         11266
monthly            10397
unknown            10149
on failure         4842
Other values (2)   5884

Length

Max length      10
Median length   9
Mean length     8.5311785
Min length      5

Characters and Unicode

Total characters      633440
Distinct characters   20
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   annually
2nd row   never pay
3rd row   per bucket
4th row   never pay
5th row   never pay

Common Values

Value        Count   Frequency (%)
never pay    31712   42.7%
per bucket   11266   15.2%
monthly      10397   14.0%
unknown      10149   13.7%
on failure   4842    6.5%
annually     4570    6.2%
other        1314    1.8%
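payment and payment_type report the same seven counts under different labels ("pay when scheme fails" versus "on failure", "pay annually" versus "annually"). Matching counts alone do not prove the columns carry the same information; a row-level check does. Two columns are interchangeable if every label in one always pairs with the same label in the other, and vice versa. A sketch of that check, using the first four Sample rows shown for each variable:

```python
def is_relabeling(col_a, col_b):
    """True if col_a and col_b are a one-to-one relabeling of each other."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    forward, backward = {}, {}
    for a, b in zip(col_a, col_b):
        # setdefault stores the first pairing seen and returns the stored partner
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

payment = ["pay annually", "never pay", "pay per bucket", "never pay"]
payment_type = ["annually", "never pay", "per bucket", "never pay"]
print(is_relabeling(payment, payment_type))
```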

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value      Count   Frequency (%)
never      31712   26.0%
pay        31712   26.0%
per        11266   9.2%
bucket     11266   9.2%
monthly    10397   8.5%
unknown    10149   8.3%
on         4842    4.0%
failure    4842    4.0%
annually   4570    3.7%
other      1314    1.1%

Most occurring characters

Value               Count    Frequency (%)
e                   92112    14.5%
n                   86538    13.7%
r                   49134    7.8%
(space)             47820    7.5%
y                   46679    7.4%
a                   45694    7.2%
p                   42978    6.8%
v                   31712    5.0%
u                   30827    4.9%
o                   26702    4.2%
Other values (10)   133244   21.0%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   633440   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   633440   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   633440   100.0%

water_quality
Categorical

Imbalance 

Distinct       8
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

soft               63505
salty              6082
unknown            2345
milky              1005
coloured           623
Other values (3)   690

Length

Max length      18
Median length   4
Mean length     4.3039057
Min length      4

Characters and Unicode

Total characters      319565
Distinct characters   19
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   soft
2nd row   soft
3rd row   soft
4th row   soft
5th row   soft

Common Values

Value                Count   Frequency (%)
soft                 63505   85.5%
salty                6082    8.2%
unknown              2345    3.2%
milky                1005    1.4%
coloured             623     0.8%
salty abandoned      423     0.6%
fluoride             244     0.3%
fluoride abandoned   23      < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value       Count   Frequency (%)
soft        63505   85.0%
salty       6505    8.7%
unknown     2345    3.1%
milky       1005    1.3%
coloured    623     0.8%
abandoned   446     0.6%
fluoride    267     0.4%

Most occurring characters

Value              Count   Frequency (%)
s                  70010   21.9%
t                  70010   21.9%
o                  67809   21.2%
f                  63772   20.0%
l                  8400    2.6%
n                  7927    2.5%
y                  7510    2.4%
a                  7397    2.3%
k                  3350    1.0%
u                  3235    1.0%
Other values (9)   10145   3.2%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   319565   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   319565   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   319565   100.0%

quality_group
Categorical

Imbalance 

Distinct       6
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

good      63505
salty     6505
unknown   2345
milky     1005
colored   623

Length

Max length      8
Median length   4
Mean length     4.2354478
Min length      4

Characters and Unicode

Total characters      314482
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   good
2nd row   good
3rd row   good
4th row   good
5th row   good

Common Values

Value      Count   Frequency (%)
good       63505   85.5%
salty      6505    8.8%
unknown    2345    3.2%
milky      1005    1.4%
colored    623     0.8%
fluoride   267     0.4%
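quality_group looks like a coarser version of water_quality. The counts line up if "soft" becomes "good", "coloured" becomes "colored", and each "abandoned" variant is folded into its base category (6082 + 423 = 6505 salty; 244 + 23 = 267 fluoride). That mapping is inferred from the counts, not documented in the report; a sketch that applies it and recounts:

```python
from collections import Counter

# Inferred from the reported counts; an assumption, not a documented mapping
QUALITY_GROUP = {
    "soft": "good", "salty": "salty", "salty abandoned": "salty",
    "unknown": "unknown", "milky": "milky", "coloured": "colored",
    "fluoride": "fluoride", "fluoride abandoned": "fluoride",
}

def regroup(value_counts, mapping):
    """Collapse fine-grained categories into groups, summing their counts."""
    grouped = Counter()
    for value, n in value_counts.items():
        grouped[mapping[value]] += n
    return grouped

water_quality = {"soft": 63505, "salty": 6082, "unknown": 2345, "milky": 1005,
                 "coloured": 623, "salty abandoned": 423, "fluoride": 244,
                 "fluoride abandoned": 23}
print(regroup(water_quality, QUALITY_GROUP))
```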

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words

Value      Count   Frequency (%)
good       63505   85.5%
salty      6505    8.8%
unknown    2345    3.2%
milky      1005    1.4%
colored    623     0.8%
fluoride   267     0.4%

Most occurring characters

Value              Count    Frequency (%)
o                  130868   41.6%
d                  64395    20.5%
g                  63505    20.2%
l                  8400     2.7%
y                  7510     2.4%
n                  7035     2.2%
t                  6505     2.1%
a                  6505     2.1%
s                  6505     2.1%
k                  3350     1.1%
Other values (8)   9904     3.1%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   314482   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   314482   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   314482   100.0%

quantity
Categorical

Distinct       5
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

enough         41522
insufficient   18896
dry            7782
seasonal       5075
unknown        975

Length

Max length      12
Median length   6
Mean length     7.3623569
Min length      3

Characters and Unicode

Total characters      546655
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   enough
2nd row   insufficient
3rd row   enough
4th row   dry
5th row   seasonal

Common Values

Value          Count   Frequency (%)
enough         41522   55.9%
insufficient   18896   25.4%
dry            7782    10.5%
seasonal       5075    6.8%
unknown        975     1.3%

Length

2025-04-26T17:27:38.987138image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value          Count   Frequency (%)
enough         41522   55.9%
insufficient   18896   25.4%
dry            7782    10.5%
seasonal       5075    6.8%
unknown        975     1.3%

Most occurring characters

Value              Count   Frequency (%)
n                  87314   16.0%
e                  65493   12.0%
u                  61393   11.2%
i                  56688   10.4%
o                  47572   8.7%
g                  41522   7.6%
h                  41522   7.6%
f                  37792   6.9%
s                  29046   5.3%
t                  18896   3.5%
Other values (8)   59417   10.9%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   546655   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   546655   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   546655   100.0%

quantity_group
Categorical

Distinct       5
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

enough         41522
insufficient   18896
dry            7782
seasonal       5075
unknown        975

Length

Max length      12
Median length   6
Mean length     7.3623569
Min length      3

Characters and Unicode

Total characters      546655
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   enough
2nd row   insufficient
3rd row   enough
4th row   dry
5th row   seasonal

Common Values

Value          Count   Frequency (%)
enough         41522   55.9%
insufficient   18896   25.4%
dry            7782    10.5%
seasonal       5075    6.8%
unknown        975     1.3%

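quantity_group reports exactly the same values, counts and length statistics as quantity, and the two columns agree in every sample row at the end of this report. Matching distributions alone do not prove the columns are identical row by row, so a row-wise check is the safer test. A minimal sketch, using a few hypothetical records shaped like the sample rows:

```python
def columns_identical(rows: list[dict], a: str, b: str) -> bool:
    """Return True if columns a and b hold the same value in every row."""
    return all(row[a] == row[b] for row in rows)


# Hypothetical records; on the real data these would be all 74250 rows.
rows = [
    {"quantity": "enough", "quantity_group": "enough"},
    {"quantity": "insufficient", "quantity_group": "insufficient"},
    {"quantity": "dry", "quantity_group": "dry"},
]

print(columns_identical(rows, "quantity", "quantity_group"))  # True
```

With pandas, `df["quantity"].equals(df["quantity_group"])` performs the same comparison on a whole DataFrame.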
Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value          Count   Frequency (%)
enough         41522   55.9%
insufficient   18896   25.4%
dry            7782    10.5%
seasonal       5075    6.8%
unknown        975     1.3%

Most occurring characters

Value              Count   Frequency (%)
n                  87314   16.0%
e                  65493   12.0%
u                  61393   11.2%
i                  56688   10.4%
o                  47572   8.7%
g                  41522   7.6%
h                  41522   7.6%
f                  37792   6.9%
s                  29046   5.3%
t                  18896   3.5%
Other values (8)   59417   10.9%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   546655   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   546655   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   546655   100.0%

source
Categorical

Distinct               10
Distinct (%)           < 0.1%
Missing                0
Missing (%)            0.0%
Memory size            1.1 MiB

spring                 21216
shallow well           21140
machine dbh            13822
river                  11964
rainwater harvesting   2863
Other values (5)       3245

Length

Max length      20
Median length   12
Mean length     8.9857104
Min length      3

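The row-level median length can be recomputed from the Common Values counts for source: group the rows by len(label), then walk the lengths in sorted order until the two middle rows of the 74250 are reached. Under that definition the sketch below gives 11 (the length of "machine dbh"), not the 12 reported above. The difference may come from how the profiler weights its median, so treat the reported median length with some care. A minimal sketch, assuming the plain row-level median is the intended statistic:

```python
# Value counts copied from the source Common Values table.
value_counts = {
    "spring": 21216,
    "shallow well": 21140,
    "machine dbh": 13822,
    "river": 11964,
    "rainwater harvesting": 2863,
    "hand dtw": 1108,
    "lake": 950,
    "dam": 840,
    "other": 261,
    "unknown": 86,
}


def median_length(counts: dict[str, int]) -> float:
    """Median of len(value) over all rows, computed from value counts."""
    lengths: dict[int, int] = {}
    for label, n in counts.items():
        lengths[len(label)] = lengths.get(len(label), 0) + n
    n_rows = sum(lengths.values())
    # 1-based positions of the two middle rows (the same row when n_rows is odd).
    lower, upper = (n_rows + 1) // 2, n_rows // 2 + 1
    seen = 0
    low_val = None
    for length in sorted(lengths):
        seen += lengths[length]
        if low_val is None and seen >= lower:
            low_val = length
        if seen >= upper:
            return (low_val + length) / 2
    raise ValueError("counts must not be empty")


print(median_length(value_counts))  # 11.0
```

Applied to the source_type counts further below, the same function gives 8.0, while that section also reports a median length of 12.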
Characters and Unicode

Total characters      667189
Distinct characters   21
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   spring
2nd row   rainwater harvesting
3rd row   dam
4th row   machine dbh
5th row   rainwater harvesting

Common Values

Value                  Count   Frequency (%)
spring                 21216   28.6%
shallow well           21140   28.5%
machine dbh            13822   18.6%
river                  11964   16.1%
rainwater harvesting   2863    3.9%
hand dtw               1108    1.5%
lake                   950     1.3%
dam                    840     1.1%
other                  261     0.4%
unknown                86      0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value              Count   Frequency (%)
spring             21216   18.7%
shallow            21140   18.7%
well               21140   18.7%
machine            13822   12.2%
dbh                13822   12.2%
river              11964   10.6%
rainwater          2863    2.5%
harvesting         2863    2.5%
hand               1108    1.0%
dtw                1108    1.0%
Other values (4)   2137    1.9%

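The table above that lists "shallow" and "well" separately counts words, not rows: a row holding "shallow well" adds one to each word, so the percentages are shares of all 113183 words rather than of the 74250 rows. Splitting each label on whitespace reproduces every count in that table; whitespace splitting is an assumption that matches these counts, and the profiler's own tokenizer may behave differently on other text. A minimal sketch from the Common Values counts:

```python
from collections import Counter

# Value counts copied from the source Common Values table.
value_counts = {
    "spring": 21216,
    "shallow well": 21140,
    "machine dbh": 13822,
    "river": 11964,
    "rainwater harvesting": 2863,
    "hand dtw": 1108,
    "lake": 950,
    "dam": 840,
    "other": 261,
    "unknown": 86,
}

word_counts = Counter()
for label, n_rows in value_counts.items():
    # Every word of a label occurs once per row that holds the label.
    for word in label.split():
        word_counts[word] += n_rows

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(10):
    print(f"{word} {count} {count / total_words:.1%}")
```

The four labels left out of the top ten (lake, dam, other, unknown) sum to 2137, the "Other values (4)" row.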
Most occurring characters

Value               Count    Frequency (%)
l                   85510    12.8%
r                   53994    8.1%
e                   53863    8.1%
h                   53016    7.9%
i                   52728    7.9%
a                   46449    7.0%
w                   46337    6.9%
s                   45219    6.8%
n                   42130    6.3%
(space)             38933    5.8%
Other values (11)   149010   22.3%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   667189   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   667189   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   667189   100.0%

source_type
Categorical

Distinct               7
Distinct (%)           < 0.1%
Missing                0
Missing (%)            0.0%
Memory size            1.1 MiB

spring                 21216
shallow well           21140
borehole               14930
river/lake             12914
rainwater harvesting   2863
Other values (2)       1187

Length

Max length      20
Median length   12
Mean length     9.3073535
Min length      3

Characters and Unicode

Total characters      691071
Distinct characters   20
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

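Every character in this report is filed under the "(unknown)" category, script and block. For comparison, Python's standard `unicodedata.category` returns the Unicode general category of each character in the source_type labels. A minimal sketch; it covers general categories only, because `unicodedata` exposes neither scripts nor blocks:

```python
import unicodedata

# The seven source_type labels from the Common Values table.
labels = [
    "spring", "shallow well", "borehole", "river/lake",
    "rainwater harvesting", "dam", "other",
]

# Map each distinct character to its Unicode general category.
categories = {ch: unicodedata.category(ch) for ch in set("".join(labels))}

print(len(categories))                     # 20
print(sorted(set(categories.values())))   # ['Ll', 'Po', 'Zs']
```

The 20 distinct characters match "Distinct characters 20"; resolved this way they fall into three general categories (lowercase letter, other punctuation for "/", space separator) rather than one.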
Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   spring
2nd row   rainwater harvesting
3rd row   dam
4th row   borehole
5th row   rainwater harvesting

Common Values

Value                  Count   Frequency (%)
spring                 21216   28.6%
shallow well           21140   28.5%
borehole               14930   20.1%
river/lake             12914   17.4%
rainwater harvesting   2863    3.9%
dam                    840     1.1%
other                  347     0.5%

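The source_type counts are consistent with a coarser grouping of source: borehole (14930) equals machine dbh plus hand dtw (13822 + 1108), river/lake (12914) equals river plus lake (11964 + 950), and other (347) equals other plus unknown (261 + 86). The mapping below is inferred from those counts and is not documented in the report; matching totals also do not prove the mapping holds row by row. A sketch that rolls the source counts up through it:

```python
from collections import Counter

# Value counts copied from the source Common Values table.
source_counts = {
    "spring": 21216,
    "shallow well": 21140,
    "machine dbh": 13822,
    "river": 11964,
    "rainwater harvesting": 2863,
    "hand dtw": 1108,
    "lake": 950,
    "dam": 840,
    "other": 261,
    "unknown": 86,
}

# Inferred mapping (an assumption); labels not listed keep their own name.
TO_SOURCE_TYPE = {
    "machine dbh": "borehole",
    "hand dtw": "borehole",
    "river": "river/lake",
    "lake": "river/lake",
    "unknown": "other",
}

source_type_counts = Counter()
for label, n in source_counts.items():
    source_type_counts[TO_SOURCE_TYPE.get(label, label)] += n

print(dict(source_type_counts))
```

The result reproduces all seven counts in the source_type Common Values table above.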
Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value        Count   Frequency (%)
spring       21216   21.6%
shallow      21140   21.5%
well         21140   21.5%
borehole     14930   15.2%
river/lake   12914   13.1%
rainwater    2863    2.9%
harvesting   2863    2.9%
dam          840     0.9%
other        347     0.4%

Most occurring characters

Value               Count    Frequency (%)
l                   112404   16.3%
e                   82901    12.0%
r                   70910    10.3%
o                   51347    7.4%
s                   45219    6.5%
w                   45143    6.5%
a                   43483    6.3%
i                   39856    5.8%
h                   39280    5.7%
n                   26942    3.9%
Other values (10)   133586   19.3%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   691071   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   691071   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   691071   100.0%

source_class
Categorical

Distinct       3
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

groundwater    57286
surface        16617
unknown        347

Length

Max length      11
Median length   11
Mean length     10.086114
Min length      7

Characters and Unicode

Total characters      748894
Distinct characters   14
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   groundwater
2nd row   surface
3rd row   surface
4th row   groundwater
5th row   surface

Common Values

Value         Count   Frequency (%)
groundwater   57286   77.2%
surface       16617   22.4%
unknown       347     0.5%

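Likewise, the source_class counts match a further grouping of source_type: groundwater (57286) = spring + shallow well + borehole (21216 + 21140 + 14930), surface (16617) = river/lake + rainwater harvesting + dam (12914 + 2863 + 840), and unknown (347) = other. This mapping is also inferred from the counts, not documented in the report; a sketch:

```python
from collections import Counter

# Value counts copied from the source_type Common Values table.
source_type_counts = {
    "spring": 21216,
    "shallow well": 21140,
    "borehole": 14930,
    "river/lake": 12914,
    "rainwater harvesting": 2863,
    "dam": 840,
    "other": 347,
}

# Inferred mapping (an assumption) from source_type to source_class.
TO_SOURCE_CLASS = {
    "spring": "groundwater",
    "shallow well": "groundwater",
    "borehole": "groundwater",
    "river/lake": "surface",
    "rainwater harvesting": "surface",
    "dam": "surface",
    "other": "unknown",
}

source_class_counts = Counter()
for label, n in source_type_counts.items():
    source_class_counts[TO_SOURCE_CLASS[label]] += n

n_rows = sum(source_class_counts.values())
for label, n in source_class_counts.most_common():
    print(f"{label} {n} {n / n_rows:.1%}")
```

The printed counts and percentages reproduce the source_class Common Values table above.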
Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value         Count   Frequency (%)
groundwater   57286   77.2%
surface       16617   22.4%
unknown       347     0.5%

Most occurring characters

Value              Count    Frequency (%)
r                  131189   17.5%
u                  74250    9.9%
a                  73903    9.9%
e                  73903    9.9%
n                  58327    7.8%
o                  57633    7.7%
w                  57633    7.7%
g                  57286    7.6%
d                  57286    7.6%
t                  57286    7.6%
Other values (4)   50198    6.7%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   748894   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   748894   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   748894   100.0%

waterpoint_type
Categorical

Distinct                      7
Distinct (%)                  < 0.1%
Missing                       0
Missing (%)                   0.0%
Memory size                   1.1 MiB

communal standpipe            35628
hand pump                     21884
other                         8010
communal standpipe multiple   7611
improved spring               959
Other values (2)              158

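The overview lists the five most frequent labels and folds the remainder into a single "Other values (2)" row: cattle trough (150) plus dam (8) gives 158. A minimal sketch of that summarization, with the cut-off of five read off the list above (the profiler's own setting may differ or be configurable):

```python
def top_k_with_other(counts: dict[str, int], k: int = 5) -> list[tuple[str, int]]:
    """Keep the k most frequent labels and merge the rest into one row."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    head, tail = ranked[:k], ranked[k:]
    if tail:
        head.append((f"Other values ({len(tail)})", sum(n for _, n in tail)))
    return head


# Value counts copied from the waterpoint_type Common Values table.
waterpoint_counts = {
    "communal standpipe": 35628,
    "hand pump": 21884,
    "other": 8010,
    "communal standpipe multiple": 7611,
    "improved spring": 959,
    "cattle trough": 150,
    "dam": 8,
}

for label, n in top_k_with_other(waterpoint_counts):
    print(label, n)
```

The last printed line is "Other values (2) 158", matching the overview.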
Length

Max length      27
Median length   18
Mean length     14.817051
Min length      3

Characters and Unicode

Total characters      1100166
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   communal standpipe
2nd row   communal standpipe
3rd row   communal standpipe multiple
4th row   communal standpipe multiple
5th row   communal standpipe

Common Values

Value                         Count   Frequency (%)
communal standpipe            35628   48.0%
hand pump                     21884   29.5%
other                         8010    10.8%
communal standpipe multiple   7611    10.3%
improved spring               959     1.3%
cattle trough                 150     0.2%
dam                           8       < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value       Count   Frequency (%)
communal    43239   29.2%
standpipe   43239   29.2%
hand        21884   14.8%
pump        21884   14.8%
other       8010    5.4%
multiple    7611    5.1%
improved    959     0.6%
spring      959     0.6%
cattle      150     0.1%
trough      150     0.1%

Most occurring characters

Value              Count    Frequency (%)
p                  139775   12.7%
m                  116940   10.6%
n                  109321   9.9%
a                  108520   9.9%
(space)            73843    6.7%
u                  72884    6.6%
d                  66090    6.0%
e                  59969    5.5%
t                  59310    5.4%
l                  58611    5.3%
Other values (8)   234903   21.4%

Most occurring categories

Value       Count     Frequency (%)
(unknown)   1100166   100.0%

Most occurring scripts

Value       Count     Frequency (%)
(unknown)   1100166   100.0%

Most occurring blocks

Value       Count     Frequency (%)
(unknown)   1100166   100.0%


waterpoint_type_group
Categorical

Distinct             6
Distinct (%)         < 0.1%
Missing              0
Missing (%)          0.0%
Memory size          1.1 MiB

communal standpipe   43239
hand pump            21884
other                8010
improved spring      959
cattle trough        150

Length

Max length      18
Median length   18
Mean length     13.894505
Min length      3

Characters and Unicode

Total characters      1031667
Distinct characters   18
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   communal standpipe
2nd row   communal standpipe
3rd row   communal standpipe
4th row   communal standpipe
5th row   communal standpipe

Common Values

Value                Count   Frequency (%)
communal standpipe   43239   58.2%
hand pump            21884   29.5%
other                8010    10.8%
improved spring      959     1.3%
cattle trough        150     0.2%
dam                  8       < 0.1%

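waterpoint_type_group differs from waterpoint_type only in merging "communal standpipe multiple" into "communal standpipe": 35628 + 7611 = 43239, while the other five counts are unchanged. The sketch below applies that merge, which is inferred from the counts rather than documented in the report:

```python
from collections import Counter

# Value counts copied from the waterpoint_type Common Values table.
waterpoint_counts = {
    "communal standpipe": 35628,
    "hand pump": 21884,
    "other": 8010,
    "communal standpipe multiple": 7611,
    "improved spring": 959,
    "cattle trough": 150,
    "dam": 8,
}

# Inferred merge (an assumption): only this label changes.
MERGE = {"communal standpipe multiple": "communal standpipe"}

group_counts = Counter()
for label, n in waterpoint_counts.items():
    group_counts[MERGE.get(label, label)] += n

n_rows = sum(group_counts.values())
share = group_counts["communal standpipe"] / n_rows
print(group_counts["communal standpipe"], f"{share:.1%}")  # 43239 58.2%
```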
Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value       Count   Frequency (%)
communal    43239   30.8%
standpipe   43239   30.8%
hand        21884   15.6%
pump        21884   15.6%
other       8010    5.7%
improved    959     0.7%
spring      959     0.7%
cattle      150     0.1%
trough      150     0.1%
dam         8       < 0.1%

Most occurring characters

Value              Count    Frequency (%)
p                  132164   12.8%
m                  109329   10.6%
n                  109321   10.6%
a                  108520   10.5%
(space)            66232    6.4%
d                  66090    6.4%
u                  65273    6.3%
e                  52358    5.1%
o                  52358    5.1%
t                  51699    5.0%
Other values (8)   218323   21.2%

Most occurring categories

Value       Count     Frequency (%)
(unknown)   1031667   100.0%

Most occurring scripts

Value       Count     Frequency (%)
(unknown)   1031667   100.0%

Most occurring blocks

Value       Count     Frequency (%)
(unknown)   1031667   100.0%

original_file
Categorical

Distinct       2
Distinct (%)   < 0.1%
Missing        0
Missing (%)    0.0%
Memory size    1.1 MiB

train          59400
test           14850

Length

Max length      5
Median length   5
Mean length     4.8
Min length      4

Characters and Unicode

Total characters      356400
Distinct characters   7
Distinct categories   1
Distinct scripts      1
Distinct blocks       1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique       0
Unique (%)   0.0%

Sample

1st row   train
2nd row   train
3rd row   train
4th row   train
5th row   train

Common Values

Value   Count   Frequency (%)
train   59400   80.0%
test    14850   20.0%

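original_file records which input file each row came from: 59400 train rows and 14850 test rows, an 80/20 split of the 74250 combined rows. When the two parts are needed separately again, filtering on this tag recovers them. A minimal sketch over hypothetical records (the field values are placeholders, not rows from this dataset); with pandas the equivalent is a boolean mask such as `df[df["original_file"] == "train"]`:

```python
def split_by_origin(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows by their original_file tag, dropping the tag itself."""
    parts: dict[str, list[dict]] = {}
    for row in rows:
        record = {k: v for k, v in row.items() if k != "original_file"}
        parts.setdefault(row["original_file"], []).append(record)
    return parts


# Hypothetical records standing in for the combined dataset.
rows = [
    {"id": 1, "quantity": "enough", "original_file": "train"},
    {"id": 2, "quantity": "insufficient", "original_file": "train"},
    {"id": 3, "quantity": "dry", "original_file": "test"},
]

parts = split_by_origin(rows)
print({name: len(part) for name, part in parts.items()})  # {'train': 2, 'test': 1}

# Shares in the full dataset, from the counts in this section.
print(f"{59400 / 74250:.1%} train, {14850 / 74250:.1%} test")  # 80.0% train, 20.0% test
```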
Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Common words

Value   Count   Frequency (%)
train   59400   80.0%
test    14850   20.0%

Most occurring characters

Value   Count   Frequency (%)
t       89100   25.0%
r       59400   16.7%
a       59400   16.7%
i       59400   16.7%
n       59400   16.7%
e       14850   4.2%
s       14850   4.2%

Most occurring categories

Value       Count    Frequency (%)
(unknown)   356400   100.0%

Most occurring scripts

Value       Count    Frequency (%)
(unknown)   356400   100.0%

Most occurring blocks

Value       Count    Frequency (%)
(unknown)   356400   100.0%

Interactions

[Figure: 10 × 10 grid of pairwise interaction plots between the 10 numeric variables]

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

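Nullity correlation, as described above, correlates missingness indicators: each column becomes 1 where a value is missing and 0 where it is present, and pairs of indicator columns are then correlated. A minimal stdlib sketch of that idea using the Pearson coefficient on hypothetical values (an illustration of the concept, not the profiler's code; `None` stands in for a missing cell):

```python
from math import sqrt


def nullity_correlation(a: list, b: list) -> float:
    """Pearson correlation between the is-missing indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) * (xi - mean_x) for xi in x)
    var_y = sum((yi - mean_y) * (yi - mean_y) for yi in y)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no nullity variance.
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical columns in which the two fields go missing together.
funder = ["Roman", None, "Unicef", None, "Dwsp"]
installer = ["Roman", None, "UNICEF", None, "DWE"]
print(round(nullity_correlation(funder, installer), 6))  # 1.0
```

A value near 1 means the two columns tend to be missing together, near -1 that one is missing when the other is present, and near 0 that their missingness is unrelated.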
Sample

First rows

id amount_tsh date_recorded funder gps_height installer longitude latitude wpt_name num_private basin subvillage region region_code district_code lga ward population public_meeting recorded_by scheme_management scheme_name permit construction_year extraction_type extraction_type_group extraction_type_class management management_group payment payment_type water_quality quality_group quantity quantity_group source source_type source_class waterpoint_type waterpoint_type_group original_file
0695726000.02011-03-14Roman1390Roman34.938093-9.856322none0Lake NyasaMnyusi BIringa115LudewaMundindi109TrueGeoData Consultants LtdVWCRomanFalse1999gravitygravitygravityvwcuser-grouppay annuallyannuallysoftgoodenoughenoughspringspringgroundwatercommunal standpipecommunal standpipetrain
187760.02013-03-06Grumeti1399GRUMETI34.698766-2.147466Zahanati0Lake VictoriaNyamaraMara202SerengetiNatta280NaNGeoData Consultants LtdOtherNaNTrue2010gravitygravitygravitywuguser-groupnever paynever paysoftgoodinsufficientinsufficientrainwater harvestingrainwater harvestingsurfacecommunal standpipecommunal standpipetrain
23431025.02013-02-25Lottery Club686World vision37.460664-3.821329Kwa Mahundi0PanganiMajengoManyara214SimanjiroNgorika250TrueGeoData Consultants LtdVWCNyumba ya mungu pipe schemeTrue2009gravitygravitygravityvwcuser-grouppay per bucketper bucketsoftgoodenoughenoughdamdamsurfacecommunal standpipe multiplecommunal standpipetrain
3677430.02013-01-28Unicef263UNICEF38.486161-11.155298Zahanati Ya Nanyumbu0Ruvuma / Southern CoastMahakamaniMtwara9063NanyumbuNanyumbu58TrueGeoData Consultants LtdVWCNaNTrue1986submersiblesubmersiblesubmersiblevwcuser-groupnever paynever paysoftgooddrydrymachine dbhboreholegroundwatercommunal standpipe multiplecommunal standpipetrain
4197280.02011-07-13Action In A0Artisan31.130847-1.825359Shuleni0Lake VictoriaKyanyamisaKagera181KaragweNyakasimbi0TrueGeoData Consultants LtdNaNNaNTrue0gravitygravitygravityotherothernever paynever paysoftgoodseasonalseasonalrainwater harvestingrainwater harvestingsurfacecommunal standpipecommunal standpipetrain
5994420.02011-03-13Mkinga Distric Coun0DWE39.172796-4.765587Tajiri0PanganiMoa/MweremeTanga48MkingaMoa1TrueGeoData Consultants LtdVWCZingibaliTrue2009submersiblesubmersiblesubmersiblevwcuser-grouppay per bucketper bucketsaltysaltyenoughenoughotherotherunknowncommunal standpipe multiplecommunal standpipetrain
6198160.02012-10-01Dwsp0DWSP33.362410-3.766365Kwa Ngomho0InternalIshinabulandiShinyanga173Shinyanga RuralSamuye0TrueGeoData Consultants LtdVWCNaNTrue0swn 80swn 80handpumpvwcuser-groupnever paynever paysoftgoodenoughenoughmachine dbhboreholegroundwaterhand pumphand pumptrain
7545510.02012-10-09Rwssp0DWE32.620617-4.226198Tushirikiane0Lake TanganyikaNyawishi CenterShinyanga173KahamaChambo0TrueGeoData Consultants LtdNaNNaNTrue0nira/taniranira/tanirahandpumpwuguser-groupunknownunknownmilkymilkyenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumptrain
8539340.02012-11-03Wateraid0Water Aid32.711100-5.146712Kwa Ramadhan Musa0Lake TanganyikaImalaudukiTabora146Tabora UrbanItetemia0TrueGeoData Consultants LtdVWCNaNTrue0india mark iiindia mark iihandpumpvwcuser-groupnever paynever paysaltysaltyseasonalseasonalmachine dbhboreholegroundwaterhand pumphand pumptrain
9461440.02011-08-03Isingiro Ho0Artisan30.626991-1.257051Kwapeto0Lake VictoriaMkonomreKagera181KaragweKaisho0TrueGeoData Consultants LtdNaNNaNTrue0nira/taniranira/tanirahandpumpvwcuser-groupnever paynever paysoftgoodenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumptrain

Last rows

id amount_tsh date_recorded funder gps_height installer longitude latitude wpt_name num_private basin subvillage region region_code district_code lga ward population public_meeting recorded_by scheme_management scheme_name permit construction_year extraction_type extraction_type_group extraction_type_class management management_group payment payment_type water_quality quality_group quantity quantity_group source source_type source_class waterpoint_type waterpoint_type_group original_file
14840597570.02013-02-24Villagers1291Villagers35.345384-9.831170e+00Kwa Reonard0Lake NyasaTulianiRuvuma102Songea RuralWino0TrueGeoData Consultants LtdVWCMradi wa maji wa winoTrue2009gravitygravitygravityvwcuser-groupnever paynever paysoftgoodenoughenoughriverriver/lakesurfacecommunal standpipecommunal standpipetest
14841645790.02012-10-26Dwsp0DWE0.000000-2.000000e-08Iguna0Lake VictoriaNyerereShinyanga171BariadiKasoli0NaNGeoData Consultants LtdWUGNaNFalse0swn 80swn 80handpumpwuguser-groupunknownunknownsoftgoodenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumptest
1484257731600.02013-01-27Isf808DWE29.740224-4.882705e+00Hongera0Lake TanganyikaMzizini AKigoma163Kigoma RuralSimbo230TrueGeoData Consultants LtdWUGMkongoro TwoTrue2009gravitygravitygravityvwcuser-grouppay monthlymonthlysoftgoodenoughenoughriverriver/lakesurfacecommunal standpipe multiplecommunal standpipetest
14843655410.02013-02-04Oxfarm1641OXFARM29.768139-4.480618e+00Mwandami0Lake TanganyikaKosoroKigoma162Kigoma RuralMkigo1400TrueGeoData Consultants LtdWater authorityNaNFalse1995otherotherothervwcuser-groupnever paynever paysoftgoodenoughenoughspringspringgroundwaterotherothertest
14844681740.02012-11-07Netherlands0DWE34.096878-3.079689e+00Ikanayugu0Lake VictoriaMaganjuShinyanga172MaswaIpililo0TrueGeoData Consultants LtdWUGNaNFalse0nira/taniranira/tanirahandpumpwuguser-groupotherothersoftgoodenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumptest
14845393070.02011-02-24Danida34Da38.852669-6.582841e+00Kwambwezi0Wami / RuvuYomboPwani61BagamoyoYombo20TrueGeoData Consultants LtdVWCBagamoyo wateTrue1988monomonomotorpumpvwcuser-groupnever paynever paysoftgoodenoughenoughriverriver/lakesurfacecommunal standpipecommunal standpipetest
14846189901000.02011-03-21Hiap0HIAP37.451633-5.350428e+00Bonde La Mkondoa0PanganiMkondoaTanga47KilindiMvungwe2960TrueGeoData Consultants LtdVWCNaNFalse1994nira/taniranira/tanirahandpumpvwcuser-grouppay annuallyannuallysaltysaltyinsufficientinsufficientshallow wellshallow wellgroundwaterhand pumphand pumptest
14847287490.02013-03-04NaN1476NaN34.739804-4.585587e+00Bwawani0InternalJuhudiSingida132Singida RuralUghandi200TrueGeoData Consultants LtdVWCNaNNaN2010gravitygravitygravityvwcuser-groupnever paynever paysoftgoodinsufficientinsufficientdamdamsurfacecommunal standpipecommunal standpipetest
14848334920.02013-02-18Germany998DWE35.432732-1.058416e+01Kwa John0Lake NyasaNamakinga BRuvuma102Songea RuralMaposeni150TrueGeoData Consultants LtdVWCMradi wa maji wa maposeniTrue2009gravitygravitygravityvwcuser-groupnever paynever paysoftgoodinsufficientinsufficientriverriver/lakesurfacecommunal standpipecommunal standpipetest
14849687070.02013-02-13Government Of Tanzania481Government34.765054-1.122601e+01Kwa Mzee Chagala0Lake NyasaKambaRuvuma103MbingaMbamba bay40TrueGeoData Consultants LtdVWCDANIDATrue2008gravitygravitygravityvwcuser-groupnever paynever paysoftgooddrydryspringspringgroundwatercommunal standpipecommunal standpipetest